BLOCK-FLOATING-POINT REALIZATION OF DIGITAL FILTERS


Recently, the realization of digital filters by means of fixed-point and
floating-point arithmetic has been compared on the basis of roundoff noise. In
this note, an alternative realization called block-floating-point is proposed.
In block-floating-point arithmetic the input and filter states (i.e. the inputs
to the delay registers)
are  jointly  normalized  before  the  multiplications  and  adds  are  performed  with  fixed- 
point  arithmetic.  The  scale  factor  obtained  during  the  normalization  is  then  applied 
to  the  final  output  to  produce  a  fixed-point  output.  To  illustrate,  consider  a  first- 
order  filter  described  by  the  difference  equation 

$$ y_n = x_n + a_1 y_{n-1} \tag{1} $$

To  perform  the  computation  in  a  block-floating-point  manner,  we  define 

$$ A_n = \mathrm{IP}\left[ \frac{1}{\max\{ |x_n|, |y_{n-1}| \}} \right] \tag{2} $$

where IP[M] is used to denote the largest integer power of 2 which is less than
or equal to M. Thus, $A_n$ represents the power-of-two scaling which will jointly
normalize $x_n$ and $y_{n-1}$. We may then write $y_n$ as


$$ y_n = \frac{1}{A_n} \left[ A_n x_n + a_1 A_n y_{n-1} \right] $$

or alternatively as either

(3)

or


The representation of (4) is preferable to (3) since (3) implies that $y_{n-1}$
is stored in the delay while (4) implies that $A_{n-1} y_{n-1}$ is stored in the
delay. Since $A_{n-1}$ is always greater than or equal to unity,
$A_{n-1} y_{n-1}$ is represented more accurately than $y_{n-1}$. A disadvantage
with the representation of (4) is that $y_{n-1}$ must first be obtained to
compute $A_n$, and $A_n / A_{n-1}$ must then be obtained. Equation (5)
represents an alternative. Specifically, we note that

Consequently, if we first scale by $A_{n-1}$ then the incremental scaling can be
determined as specified by (6). If we consider the general case of an Nth order
filter of the form

$$ y_n = x_n + a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_N y_{n-N} \tag{7} $$

then  the  block-floating-point  realization  corresponding  to  (5)  and  represented  in  the 
direct  form  is  depicted  in  Fig.  1.  For  the  general  case 

In evaluating the performance of the block-floating-point realization in the
presence of roundoff noise we will restrict attention to first and second order
filters. Furthermore, we assume in the analysis that $A_n$ is not constrained to
be a scaling by a power of two. Finally, we assume that for the first and second
order case one bit
will  be  provided  in  the  output  register  of  the  adder  for  overflow.  This  will  always 
be  sufficient  for  the  first  order  filter,  and  is  taken  to  be  sufficient  in  a  practical 
sense  for  the  second  order  filter.  Therefore,  for  the  purpose  of  analysis  we 
replace  (8)  and  (9)  by 

In the case of a first order filter a roundoff noise source is introduced in the
multiplication by $A_n$, the multiplication by $a_1$, and the multiplication by
$1/A_n$. Denoting these noise sources by $\epsilon_{1n}$, $\epsilon_{2n}$ and
$\epsilon_{3n}$ respectively, the resulting output noise $\eta_n$ is, from (5),


$$ \eta_n = \frac{1}{A_n} \left( \epsilon_{1n} + \epsilon_{2n} \right) + \epsilon_{3n} + a_1 \eta_{n-1} $$

Assuming that $\epsilon_{1n}$, $\epsilon_{2n}$ and $\epsilon_{3n}$ are
independent from sample to sample, and are independent of each other and
$1/A_n$, then $\bar{\eta} = 0$ and

where $k_1$ is the expected value of $(1/A_n)^2$ as specified by (11). In a
similar manner, for the second order case, there are five noise generators as
depicted in Fig. 2. Assuming that the noise generators are white, and
independent of each other and $A_n$, and that all the noise generators have
variance $\sigma^2$,

$$ \sigma_\eta^2 = \sigma^2 + \sigma^2 \left( 2 + 4 r^2 \cos^2\theta + 2 r^4 \right) k_2 G $$

where $k_2$ is the expected value of $(1/A_n)^2$ and G is given by

To compare the effects of roundoff noise in the block-floating-point realization
to  the  effects  in  floating-point  and  fixed-point,  we  consider  the  input  to  be  uniformly 
distributed  white  noise  in  the  range 

where $h_n$ is the filter impulse response. This then guarantees that the output will
fit  within  a  register.  With  these  considerations,  the  normalized  output  noise-to- 
signal  ratios  for  the  first  and  second  order  filters  are  respectively 


first  order: 

second order:

To compare (14) to the corresponding expressions for floating-point and
fixed-point we will consider the high gain case and approximate $A_n$ as given
by (11) by $A_n = 1/(2|y_{n-1}|)$. Assuming that $y_n$ has a symmetric
probability density about zero, we then have that $k_1 = 4\sigma_y^2$.
Representing $a_1$ as $1 - \delta$ with $\delta$ small we then approximate (14)
by

(block-floating-point)    (16)

The corresponding approximations for floating-point and fixed-point are,
respectively, (^) and (-4). We observe, then, that for this high-gain
approximation block-floating-point is approximately one bit worse than
floating-point and, for the same size mantissas, better than fixed-point.
Furthermore, as $\delta \to 0$ the noise-to-signal ratio for both floating-point
and block-floating-point increases at a slower rate than for fixed-point.
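
The first-order recursion (1) with the joint normalization (2) can be tried out numerically. The following is a minimal sketch, not the network analyzed in this note: the helper names (`ip`, `q`, `bfp_first_order`), the 12-bit word length, the cap on the scale factor, and the choice to keep the scaled state in the delay are all assumptions made for illustration. The input is drawn uniformly from the range ±(1 − a1), which for the first-order filter is 1 divided by the sum of the absolute values of the impulse response.

```python
import math
import random


def ip(m):
    """Largest integer power of two that is <= m, for m > 0."""
    _, e = math.frexp(m)            # m = f * 2**e with 0.5 <= f < 1
    return math.ldexp(1.0, e - 1)


def q(v, bits):
    """Round v to the nearest multiple of 2**-bits (fixed-point rounding)."""
    return math.ldexp(round(math.ldexp(v, bits)), -bits)


def bfp_first_order(x, a1, bits):
    """Block-floating-point sketch of y_n = x_n + a1 * y_{n-1}."""
    s, a_prev = 0.0, 1.0                 # delay holds s = A_{n-1} * y_{n-1}
    max_scale = math.ldexp(1.0, bits)    # assumed cap on the scale factor
    y = []
    for xn in x:
        peak = max(abs(xn), abs(s) / a_prev)
        # A_n = IP[1 / max(|x_n|, |y_{n-1}|)], capped for very small signals
        a_n = max_scale if peak * max_scale <= 1.0 else ip(1.0 / peak)
        x_hat = q(a_n * xn, bits)              # scaled input
        s_hat = q((a_n / a_prev) * s, bits)    # state rescaled by A_n / A_{n-1}
        s = x_hat + q(a1 * s_hat, bits)        # A_n * y_n (may use one overflow bit)
        a_prev = a_n
        y.append(q(s / a_n, bits))             # fixed-point output
    return y


def fixed_first_order(x, a1, bits):
    """Plain fixed-point version with the same word length, for comparison."""
    prev, y = 0.0, []
    for xn in x:
        prev = q(xn, bits) + q(a1 * prev, bits)
        y.append(prev)
    return y


def exact_first_order(x, a1):
    """Double-precision reference output."""
    prev, y = 0.0, []
    for xn in x:
        prev = xn + a1 * prev
        y.append(prev)
    return y


def noise_power(y, ref):
    """Mean squared difference between a quantized output and the reference."""
    return sum((u - v) ** 2 for u, v in zip(y, ref)) / len(ref)


random.seed(0)
a1, bits = 0.9, 12
x = [random.uniform(-(1 - a1), 1 - a1) for _ in range(2000)]
ref = exact_first_order(x, a1)
bfp = bfp_first_order(x, a1, bits)
fix = fixed_first_order(x, a1, bits)
max_err = max(abs(u - v) for u, v in zip(bfp, ref))
print("block-floating-point noise power:", noise_power(bfp, ref))
print("fixed-point noise power:         ", noise_power(fix, ref))
```

Moving `a1` closer to one and comparing the two printed noise powers gives a rough empirical feel for the high-gain behaviour discussed above, although a single random input is no substitute for the statistical analysis.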

For the second order case we will restrict attention to a high gain filter
(r close to one) and furthermore choose $\theta$ small enough to assume that
$A_n \approx 1/(2|y_{n-1}|)$ so that $k_2 \approx 4\sigma_y^2$. Again, letting
$r = 1 - \delta$, we introduce the high gain approximation for G. We can
approximately bracket the expression

by noting that an upper bound is

A lower bound is obtained by noting that the sum of the absolute values of an
impulse response is the maximum attainable output value from a filter if the
maximum input value is unity. Since the maximum output of the second order
system at resonance is

$$ \frac{1}{(1 - r) \left[ 1 - 2 r \cos 2\theta + r^2 \right]^{1/2}} , $$

this provides a lower bound on the sum of the absolute values of the impulse
response.
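
The lower-bound argument can be checked numerically. The sketch below assumes the usual pole parameterization of the second-order filter, a1 = 2r cos(theta) and a2 = -r**2 (poles at radius r and angle ±theta), for which the gain at resonance is 1/[(1 − r)(1 − 2r cos 2θ + r²)^(1/2)]. The values of `r` and `theta` and the function name are illustrative assumptions.

```python
import cmath
import math


def second_order_impulse_response(r, theta, n_terms):
    """h_n of y_n = x_n + a1*y_{n-1} + a2*y_{n-2}, a1 = 2r cos(theta), a2 = -r**2."""
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    h = [1.0, a1]
    while len(h) < n_terms:
        h.append(a1 * h[-1] + a2 * h[-2])
    return h[:n_terms]


r, theta = 0.95, 0.3            # illustrative high-gain values
h = second_order_impulse_response(r, theta, 2000)

# Sum of absolute values: the largest output reachable with unit-bounded input.
sum_abs = sum(abs(v) for v in h)

# Gain at resonance, from the closed form and directly from the impulse response.
gain_formula = 1.0 / ((1.0 - r) * math.sqrt(1.0 - 2.0 * r * math.cos(2.0 * theta) + r * r))
gain_direct = abs(sum(v * cmath.exp(-1j * theta * n) for n, v in enumerate(h)))

print("sum of |h_n|      :", sum_abs)
print("gain at resonance :", gain_formula)
```

The truncation to 2000 terms is harmless here because the impulse response decays as r**n.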

Thus  we  will  consider 


For the high gain case this is approximately

With these approximations, we have for the second order case that

For comparison, the corresponding expressions for the floating-point and
fixed-point cases are:

floating-point

fixed-point


(19) 


Consequently, as in the first-order case, block-floating-point is only slightly
worse than floating-point and better than fixed-point. Again, as
$\delta \to 0$ the noise-to-signal ratio for both floating-point and
block-floating-point increases at a slower rate than for fixed-point. An
additional consideration is that (17), (18) and (19) compare noise-to-signal
ratios for equal size mantissas. Floating-point arithmetic requires additional
bits in each word to represent the characteristic while block-floating-point
requires additional bits to represent the characteristic for the entire block.
Thus it is reasonable to speculate that in some cases, for the same total number
of bits per word, block-floating-point is the least noisy realization. While it
is clear that the implementation of block-floating-point is more difficult than
fixed-point, it is almost certainly simpler than floating-point. Thus
block-floating-point appears to warrant serious consideration as a means for
implementing digital filters with hardware or on a digital computer with limited
word size.
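
The bit-count argument can be illustrated with a small worked example. The symbols here are assumptions introduced for illustration and do not appear in the analysis above: m mantissa bits per word, c characteristic bits, and a block of N + 1 words (the input and the N states of an Nth order filter). The total storage in the two cases is then

```latex
\begin{align*}
B_{\text{floating}} &= (N+1)(m + c), \\
B_{\text{block}}    &= (N+1)\,m + c .
\end{align*}
```

For equal total storage, block-floating-point could therefore spend about cN/(N+1) more bits per word on the mantissa. With c = 4 and N = 2, for instance, this is roughly 2.7 bits, and each additional mantissa bit lowers the roundoff noise variance by a factor of 4.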

An additional consideration is that, in block-floating-point, final quantization
of the input can be carried out just before the summer. If this is done, the
variance of the output noise due to input quantization is reduced by a factor
$(1/A_n)^2$.

Fig. 1. Network for block-floating-point realization of an Nth order filter.

Fig. 2. Network for block-floating-point realization of a second order filter
including roundoff noise sources.